[WIP] testing & testability improvements #9866
Open
ReubenBond wants to merge 6 commits into dotnet:main
Pull request overview
This PR introduces comprehensive testing infrastructure improvements for Orleans by adding diagnostic event collection, FakeTimeProvider integration for deterministic time control, and event-driven waiting patterns. The changes enable faster, more reliable tests by replacing polling/sleep-based waiting with event-driven approaches and virtual time control.
Key Changes:
- Added diagnostic observer infrastructure (GrainDiagnosticObserver, TimerDiagnosticObserver, ReminderDiagnosticObserver, etc.) for event-driven test waiting
- Integrated FakeTimeProvider across test infrastructure for deterministic time control in timer/reminder tests
- Replaced Thread.Sleep/Task.Delay polling patterns with event-driven waiting throughout test suite
- Enhanced LeaseBasedQueueBalancer with diagnostic event emission for streaming tests
Reviewed changes
Copilot reviewed 87 out of 87 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test/TestInfrastructure/TestExtensions/*DiagnosticObserver.cs | New diagnostic observer classes for event-driven test waiting |
| src/Orleans.TestingHost/Logging/InMemoryLoggerProvider.cs | New in-memory logging infrastructure for test log capture |
| test/TesterInternal/TimerTests/ReminderTests_*.cs | Converted to FakeTimeProvider + event-driven waiting |
| test/TesterInternal/ActivationsLifeCycleTests/*.cs | Replaced Task.Delay with FakeTimeProvider and event-driven deactivation waiting |
| test/DefaultCluster.Tests/TimerOrleansTest.cs | Replaced polling loops with event-driven timer tick waiting |
| src/Orleans.Streaming/QueueBalancer/LeaseBasedQueueBalancer.cs | Added diagnostic event emission for queue balancer changes |
| src/Orleans.Runtime/Timers/AsyncTimerFactory.cs | Integrated TimeProvider for testable timer creation |
| test/Grains/*/PlacementTestGrain.cs | Added methods for deterministic overload detector testing |
| test/**/LeaseBasedQueueBalancer.cs | Fixed race condition with proper two-phase latching pattern |
1ae919e to 41f4211
1c982a1 to 9f72802
Add structured DiagnosticListener/DiagnosticSource event definitions covering:
- Grain lifecycle (Created, Activated, Deactivating, Deactivated)
- Silo/Client lifecycle (StageStarting/Completed/Failed, ObserverStarting/Completed/Failed)
- Membership (SiloStatusChanged, ViewChanged, SiloSuspected, SiloDeclaredDead)
- Placement (StatisticsPublished/Received, ClusterStatisticsRefreshed)
- Rebalancer (CycleStart/Stop, SessionStart/Stop)
- Reminders (Registered, Unregistered, TickFiring/Completed/Failed)
- Timers (TickStart/Stop, Created, Disposed)
- Streaming (MessageDelivered, StreamInactive, SubscriptionAdded/Removed, QueueLeases)
Each event uses typed record payloads emitted via DiagnosticListener.Write() behind IsEnabled() guards. These enable deterministic test waiting and advanced diagnostics scenarios.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
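
The guarded-emission pattern described in this commit message can be sketched roughly as follows. This is only an illustration: the listener name, the event name, the DemoGrainActivatedEvent record, and the DemoCollectingObserver helper are all hypothetical and are not the definitions added by the PR. Only DiagnosticListener, IsEnabled, Write, and Subscribe are real System.Diagnostics APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical payload record; the PR's real records may carry different fields.
public sealed record DemoGrainActivatedEvent(string GrainId);

public static class DemoGrainDiagnostics
{
    // Hypothetical listener and event names, chosen for this sketch only.
    public const string ListenerName = "Demo.Orleans.Grains";
    public const string Activated = "Activated";

    public static readonly DiagnosticListener Listener = new(ListenerName);

    // Returns true when a payload was actually written.
    public static bool EmitActivated(string grainId)
    {
        if (!Listener.IsEnabled(Activated))
        {
            return false; // no interested subscriber: skip allocating the payload
        }

        Listener.Write(Activated, new DemoGrainActivatedEvent(grainId));
        return true;
    }
}

// Minimal observer that stores every payload it receives (hypothetical test helper).
public sealed class DemoCollectingObserver : IObserver<KeyValuePair<string, object?>>
{
    public List<object?> Payloads { get; } = new();

    public void OnNext(KeyValuePair<string, object?> evt) => Payloads.Add(evt.Value);
    public void OnCompleted() { }
    public void OnError(Exception error) { }
}
```

With no subscriber attached, EmitActivated returns without constructing the record. Once a test calls DemoGrainDiagnostics.Listener.Subscribe(new DemoCollectingObserver()), the same call writes the payload to that observer.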
Add DiagnosticListener-based event emission to core runtime components:
- SiloLifecycleSubject: lifecycle stage and observer start/stop/fail events
- MembershipTableManager: silo status changes, view changes, join/active/dead
- DeploymentLoadPublisher: statistics published/received/refreshed/removed
- GrainTimer: timer tick start/stop, created, disposed
All events are guarded by IsEnabled() checks to avoid overhead when no listener is subscribed.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Inject TimeProvider into AsyncTimer and AsyncTimerFactory, replacing direct DateTime.UtcNow and Task.Delay calls with TimeProvider-based alternatives. This enables FakeTimeProvider usage in tests for deterministic timing control.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
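
As a rough illustration of why injecting TimeProvider helps, the sketch below shows a hypothetical DemoAsyncTimer (not the PR's AsyncTimer) that reads the clock and delays only through the injected provider. It assumes .NET 8 or later for the Task.Delay overload that accepts a TimeProvider, and the Microsoft.Extensions.TimeProvider.Testing NuGet package for FakeTimeProvider.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Time.Testing; // NuGet: Microsoft.Extensions.TimeProvider.Testing

// Hypothetical helper for this sketch; not the PR's AsyncTimer implementation.
public sealed class DemoAsyncTimer
{
    private readonly TimeSpan _period;
    private readonly TimeProvider _timeProvider;

    public DemoAsyncTimer(TimeSpan period, TimeProvider timeProvider)
    {
        _period = period;
        _timeProvider = timeProvider;
    }

    public DateTimeOffset LastFired { get; private set; }

    // Completes once one period has elapsed according to the injected provider.
    public async Task NextTick(CancellationToken cancellationToken = default)
    {
        await Task.Delay(_period, _timeProvider, cancellationToken);
        LastFired = _timeProvider.GetUtcNow(); // instead of DateTime.UtcNow
    }
}

public static class DemoAsyncTimerUsage
{
    // Returns how much virtual time passed between start and the tick.
    public static async Task<TimeSpan> RunAsync()
    {
        var fakeTime = new FakeTimeProvider();
        var start = fakeTime.GetUtcNow();
        var timer = new DemoAsyncTimer(TimeSpan.FromMinutes(10), fakeTime);

        Task tick = timer.NextTick();
        fakeTime.Advance(TimeSpan.FromMinutes(10)); // virtual time only; no real waiting
        await tick;

        return timer.LastFired - start;
    }
}
```

Production code would pass TimeProvider.System to the same constructor, which matches the later commit that updates AsyncTimerFactory call sites.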
Add test infrastructure for event-driven test waiting:
Test observers (in TestExtensions):
- GrainDiagnosticObserver: wait for grain created/activated/deactivated counts
- MembershipDiagnosticObserver: wait for silo status changes
- PlacementDiagnosticObserver: wait for statistics propagation
- RebalancerDiagnosticObserver: wait for rebalancing cycle counts
- ReminderDiagnosticObserver: wait for reminder tick counts
- TimerDiagnosticObserver: wait for timer tick counts
Test utilities (in Orleans.TestingHost):
- DiagnosticEventCollector: generic listener for any DiagnosticSource
- InMemoryLoggerProvider: log capture with TimeProvider support
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
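
The "wait for N events" idea behind these observers could look something like the sketch below. DemoEventCountWaiter, the "Demo.Timers" listener name, and the "TickStop" event name used in the usage method are hypothetical stand-ins. The PR's observers may be structured quite differently.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; not the PR's TimerDiagnosticObserver or DiagnosticEventCollector.
public sealed class DemoEventCountWaiter : IObserver<KeyValuePair<string, object?>>, IDisposable
{
    private readonly string _eventName;
    private readonly int _expected;
    private readonly TaskCompletionSource _reached =
        new(TaskCreationOptions.RunContinuationsAsynchronously);
    private readonly IDisposable _subscription;
    private int _count;

    public DemoEventCountWaiter(DiagnosticListener listener, string eventName, int expected)
    {
        _eventName = eventName;
        _expected = expected;
        if (expected <= 0)
        {
            _reached.TrySetResult(); // nothing to wait for
        }

        _subscription = listener.Subscribe(this);
    }

    public int Count => Volatile.Read(ref _count);

    public void OnNext(KeyValuePair<string, object?> evt)
    {
        // Count only the event of interest and complete once the target is reached.
        if (evt.Key == _eventName && Interlocked.Increment(ref _count) >= _expected)
        {
            _reached.TrySetResult();
        }
    }

    public void OnCompleted() { }
    public void OnError(Exception error) { }

    // Throws TimeoutException if the expected count is not observed in time.
    public Task WaitAsync(TimeSpan timeout) => _reached.Task.WaitAsync(timeout);

    public void Dispose() => _subscription.Dispose();
}

public static class DemoEventCountWaiterUsage
{
    public static async Task<int> RunAsync()
    {
        using var listener = new DiagnosticListener("Demo.Timers"); // hypothetical name
        using var waiter = new DemoEventCountWaiter(listener, "TickStop", expected: 3);

        // Stand-in for the runtime firing a timer three times.
        for (var i = 0; i < 3; i++)
        {
            if (listener.IsEnabled("TickStop"))
            {
                listener.Write("TickStop", i);
            }
        }

        await waiter.WaitAsync(TimeSpan.FromSeconds(5));
        return waiter.Count;
    }
}
```

Compared with a polling loop around Task.Delay, the test resumes as soon as the third event is written, and the timeout only bounds the failure case.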
Instrument LocalReminderService with diagnostic events:
- Registered/Unregistered on reminder lifecycle
- TickFiring/TickCompleted/TickFailed around reminder callbacks
- Inject TimeProvider for deterministic timing
Instrument streaming pipeline with diagnostic events:
- PersistentStreamPullingAgent: MessageDelivered, StreamInactive
- LeaseBasedQueueBalancer: QueueBalancerChanged, QueueLeasesAcquired/Released
- Thread TimeProvider through PersistentStreamPullingManager
- StreamConsumerCollection: accept explicit timestamp parameter
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update all AsyncTimerFactory constructor calls in membership tests to pass TimeProvider.System as the new required parameter.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1616890 to 9a6f6b9
(this is just a wip/experiment at this stage. Opening a PR just to run CI)
This pull request introduces a new set of diagnostic event definitions for Orleans, covering grain lifecycle, membership, placement, and silo/client lifecycle events. These changes provide a structured way for advanced users to observe and react to important internal events in Orleans clusters, primarily for diagnostics, monitoring, and simulation testing scenarios. The additions include public static classes for event names, listener names, and strongly-typed event payload records for each diagnostic area.
The most important changes are:
Grain Diagnostics
- OrleansGrainDiagnostics class with listener and event names for grain activation lifecycle events, along with corresponding payload records (GrainCreatedEvent, GrainActivatedEvent, GrainDeactivatingEvent, GrainDeactivatedEvent).
Lifecycle Diagnostics
- OrleansLifecycleDiagnostics class for silo and client lifecycle events, including event names for stage and observer transitions, and detailed event payload records (such as LifecycleStageStartingEvent, LifecycleObserverFailedEvent, etc.).
Membership Diagnostics
- OrleansMembershipDiagnostics class for cluster membership events, providing event names and payload records for silo status changes, membership view changes, suspicions, and cluster join/leave events.
Placement and Load Statistics Diagnostics
- OrleansPlacementDiagnostics class for placement and silo load statistics events, with event names and payload records for statistics publication, reception, cluster-wide refresh, and removal.
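
On the consuming side, a subscriber might locate a listener by its published name and pattern-match the typed payload, roughly as sketched here. The constants class, the DemoSiloStatusChangedEvent record, and its fields are hypothetical and do not reproduce the members of OrleansMembershipDiagnostics. DiagnosticListener.AllListeners is the real System.Diagnostics entry point for discovering listeners.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical names; the PR's actual constants and payload fields may differ.
public static class DemoMembershipDiagnostics
{
    public const string ListenerName = "Demo.Orleans.Membership";
    public const string SiloStatusChanged = "SiloStatusChanged";
}

public sealed record DemoSiloStatusChangedEvent(string Silo, string OldStatus, string NewStatus);

public sealed class DemoMembershipSubscriber :
    IObserver<DiagnosticListener>,
    IObserver<KeyValuePair<string, object?>>,
    IDisposable
{
    private readonly List<IDisposable> _subscriptions = new();

    public List<DemoSiloStatusChangedEvent> Changes { get; } = new();

    public DemoMembershipSubscriber()
    {
        // AllListeners replays existing listeners and reports ones created later.
        var subscription = DiagnosticListener.AllListeners.Subscribe(this);
        lock (_subscriptions) _subscriptions.Add(subscription);
    }

    void IObserver<DiagnosticListener>.OnNext(DiagnosticListener listener)
    {
        // Attach only to the listener whose name matches the published constant.
        if (listener.Name == DemoMembershipDiagnostics.ListenerName)
        {
            var subscription = listener.Subscribe(this);
            lock (_subscriptions) _subscriptions.Add(subscription);
        }
    }

    void IObserver<KeyValuePair<string, object?>>.OnNext(KeyValuePair<string, object?> evt)
    {
        // Typed payloads allow pattern matching instead of reflection.
        if (evt.Key == DemoMembershipDiagnostics.SiloStatusChanged
            && evt.Value is DemoSiloStatusChangedEvent change)
        {
            lock (Changes) Changes.Add(change);
        }
    }

    public void OnCompleted() { }
    public void OnError(Exception error) { }

    public void Dispose()
    {
        lock (_subscriptions)
        {
            foreach (var subscription in _subscriptions) subscription.Dispose();
            _subscriptions.Clear();
        }
    }
}

public static class DemoMembershipUsage
{
    public static int Run()
    {
        using var subscriber = new DemoMembershipSubscriber();
        // Stand-in for the runtime component that owns the listener.
        using var source = new DiagnosticListener(DemoMembershipDiagnostics.ListenerName);

        if (source.IsEnabled(DemoMembershipDiagnostics.SiloStatusChanged))
        {
            source.Write(
                DemoMembershipDiagnostics.SiloStatusChanged,
                new DemoSiloStatusChangedEvent("silo-1", "Joining", "Active"));
        }

        return subscriber.Changes.Count;
    }
}
```

Publishing the listener and event names as constants means subscribers like this one do not need to hard-code strings that could drift from the runtime's values.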